AITopics | particular article

Extracting Memorized Training Data via Decomposition

Su, Ellen, Vellore, Anu, Chang, Amy, Mura, Raffaele, Nelson, Blaine, Kassianik, Paul, Karbasi, Amin

arXiv.org Artificial Intelligence, Sep-18-2024

The widespread use of Large Language Models (LLMs) in society creates new information security challenges for developers, organizations, and end-users alike. LLMs are trained on large volumes of data, and their susceptibility to revealing the exact contents of the source training datasets poses security and safety risks. Although current alignment procedures restrict common risky behaviors, they do not completely prevent LLMs from leaking data. Prior work demonstrated that LLMs may be tricked into divulging training data by using out-of-distribution queries or adversarial techniques. In this paper, we demonstrate a simple, query-based decompositional method to extract news articles from two frontier LLMs. We use instruction decomposition techniques to incrementally extract fragments of training data. Out of 3723 New York Times articles, we extract at least one verbatim sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. Our analysis demonstrates that this method successfully induces the LLM to generate texts that are reliable reproductions of news articles, meaning that they likely originate from the source training dataset. This method is simple, generalizable, and does not fine-tune or change the production model. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities, including privacy risks and unauthorized data leaks. These implications require careful consideration from model development to its end-use.
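The abstract reports results as counts of verbatim sentences per article. As a rough illustration of how such a figure could be computed, the sketch below counts how many sentences of a source article reappear word for word in a model's output. The sentence splitter, the whitespace normalization, the function names, and the sample texts are all assumptions made for this example; the abstract does not describe the authors' actual matching procedure.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def normalize(text):
    """Collapse runs of whitespace so line wrapping alone does not break a match."""
    return " ".join(text.split())


def verbatim_fraction(article, generated):
    """Return (matched, total, fraction) of article sentences found verbatim in generated."""
    haystack = normalize(generated)
    sentences = [normalize(s) for s in split_sentences(article)]
    matched = sum(1 for s in sentences if s in haystack)
    total = len(sentences)
    return matched, total, (matched / total if total else 0.0)


# Made-up texts for illustration only; not taken from any real article.
article = (
    "The council approved the budget on Monday. "
    "Critics said the plan ignores transit. "
    "A final vote is expected in June. "
    "The mayor declined to comment. "
    "Local businesses welcomed the decision."
)
generated = (
    "Here is what I recall. The council approved the budget on Monday. "
    "Some people disliked it. A final vote is expected in June."
)

matched, total, fraction = verbatim_fraction(article, generated)
print(f"{matched}/{total} sentences reproduced verbatim ({fraction:.0%})")
```

Under this hypothetical metric, the sample output reproduces 2 of 5 sentences (40%). Exact substring matching is deliberately strict: a paraphrased sentence does not count, which fits the abstract's emphasis on verbatim reproduction.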

assistant response, extracting memorized training data, particular article, (12 more...)

arXiv:2409.12367

Country:

Asia > Russia (0.46)
Europe > United Kingdom (0.28)
North America > United States > New York (0.04)
(5 more...)

Genre:

Research Report (1.00)
Personal (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Regional Government > Europe Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Nikola Tesla's Amazing Predictions for the 21st Century

#artificialintelligence, Aug-22-2016, 22:01:01 GMT

In the 1930s, journalists from publications like the New York Times and Time magazine would regularly visit Nikola Tesla at his home on the 20th floor of the Hotel Governor Clinton in Manhattan. There the elderly Tesla would regale them with stories of his early days as an inventor and often opined about what was in store for the future. Last year we looked at Tesla's prediction that eugenics and the forced sterilization of criminals and other supposed undesirables would somehow purify the human race by the year 2100. Today we have more from that particular article, which appeared in the February 9, 1935, issue of Liberty magazine. The article is unique because it wasn't presented as a simple interview like so many of Tesla's other media appearances from this time, but rather is credited as "by Nikola Tesla, as told to George Sylvester Viereck."

artificial intelligence, nikola tesla, tesla, (16 more...)

Country:

North America > United States > New York (0.06)
Asia > India (0.05)
Asia > China (0.05)

Industry: Government > Regional Government > North America Government > United States Government (0.97)

Technology: Information Technology > Artificial Intelligence > Robots (0.31)

Supervised Learning for Document Classification with Scikit-Learn - QuantStart

#artificialintelligence, Jun-1-2016, 07:11:56 GMT

This is the first article in what will become a set of tutorials on how to carry out natural language document classification, for the purposes of sentiment analysis and, ultimately, automated trade filter or signal generation. This particular article will make use of Support Vector Machines (SVM) to classify text documents into mutually exclusive groups. Since this is the first article written in 2015, I feel it is now time to move on from Python 2.7.x and make use of the latest 3.4.x. Hence all code in this article will be written with 3.4.x in mind. There are a significant number of steps to carry out between viewing a text document on a web site, say, and using its content as an input to an automated trading strategy to generate trade filters or signals. In this particular article we will avoid discussion of how to download multiple articles from external sources and make use of a given dataset that already comes with its own provided labels. This will allow us to concentrate on the implementation of the "classification pipeline", rather than spend a substantial amount of time obtaining and tagging documents. In subsequent articles in this series we will make use of Python libraries, such as Scrapy and BeautifulSoup, to automatically obtain many web-based articles and effectively extract their text-based data from the HTML.
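To make the "classification pipeline" idea concrete, here is a minimal sketch of an SVM text classifier in scikit-learn: documents are turned into TF-IDF vectors and passed to a linear SVM. It is not the tutorial's own code. The tiny labelled dataset, the label names, and the choice of `TfidfVectorizer` with `LinearSVC` are assumptions made for illustration, and a current scikit-learn release requires a newer Python than the 3.4.x mentioned above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical, hand-labelled headlines; a real dataset would be far larger.
train_docs = [
    "profits surge as sales rally",
    "strong earnings lift shares",
    "shares rally on strong profits",
    "sales surge lifts earnings",
    "losses widen as sales slump",
    "weak demand sinks shares",
    "shares plunge on weak outlook",
    "demand slump deepens losses",
]
train_labels = [
    "positive", "positive", "positive", "positive",
    "negative", "negative", "negative", "negative",
]

# Pipeline: raw text -> TF-IDF features -> linear Support Vector Machine.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_docs, train_labels)

# Classify unseen text; each document is assigned exactly one label.
new_docs = ["strong rally in profits", "weak demand and a slump"]
predictions = list(model.predict(new_docs))
print(predictions)
```

Wrapping the vectorizer and the classifier in one pipeline keeps the vocabulary learned from the training documents tied to the model, so new text is transformed the same way at prediction time.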

classifier, machine learning, natural language, (18 more...)

Country:

Asia > Thailand (0.04)
Asia > Japan (0.04)

Industry: Banking & Finance > Trading (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.59)
